A complete self-control mechanism is proposed in the dynamics of neural networks through the introduction of a time-dependent threshold, determined as a function of both the noise and the pattern activity in the network. Especially for sparsely coded models, this mechanism is shown to considerably improve the storage capacity, the basins of attraction and the mutual information content of the network.

PACS numbers: 87.10.+e, 64.60.Cn

Sparsely coded models have attracted a lot of attention in the development of neural networks, from both the device-oriented and the biologically oriented point of view […]. It is well known that they have a large storage capacity, which behaves as $1/(a|\ln a|)$ for small $a$, where $a$ is the pattern activity. However, the basins of attraction, for example, should not become too small, because then sparse coding is, in fact, useless.

In this context the necessity of an activity control system has been emphasized, which tries to keep the activity of the network during retrieval the same as that of the memorized patterns […]. This has led to several discussions imposing external constraints on the dynamics (see the references in […]). Clearly, enforcing such a constraint at every time step destroys part of the autonomous functioning of the network.

An important question is then whether the capacity of storage and retrieval with non-negligible basins of attraction can be improved, and even optimized, without imposing these external constraints, while keeping the architecture of the network simple.

In this Letter we answer this question by proposing, as far as we are aware for the first time, a complete self-control mechanism in the dynamics of neural networks. This is done through the introduction of a time-dependent threshold in the transfer function. This threshold is chosen as a function of the noise in the system and the pattern activity, and it adapts itself in the course of the time evolution. The difference from existing results in the literature […] lies precisely in this adaptivity. It immediately solves, e.g., the difficult problem of finding the usually narrow interval for an optimal threshold such that the basins of attraction of the memorized patterns do not shrink to zero.

We have worked out the practical case of sparsely coded models. We find that the storage capacity, the basins of attraction as well as the mutual information content are improved. These results are shown to be valid also for not so sparse models. Indeed, a similar self-control mechanism should even work in more complicated architectures, e.g., layered and fully connected ones. Furthermore, this idea of self-control might be relevant for dynamical systems in general, when trying to improve the basins of attraction and the convergence times.

FIG. 1. The information $i$ as a function of $\theta$ without self-control for $a = 0.1$ (top) and $a = 0.001$ (bottom) for several values of $\alpha$.

FIG. 2. The evolution of the overlap $m_t$ for several initial values $m_0$, with $q_0 = 0.01 = a$ and $\alpha = 4$ for the self-control model (right) and the optimal threshold model (left). The dashed curves are a guide to the eye.

Consider a network of $N$ binary neurons. At time $t$ and zero temperature the neurons $\sigma_{i,t} \in \{0,1\}$, $i = 1,\dots,N$, are updated in parallel according to the rule

$$\sigma_{i,t+1} = F_{\theta_t}(h_{i,t}), \qquad h_{i,t} = \sum_j J_{ij}(\sigma_{j,t} - a). \tag{1}$$

In general, the input-output relation $F_{\theta_t}$ can be a monotonic function with $\theta_t$ a time-dependent threshold. In the sequel we restrict ourselves to the step function $F_{\theta_t}(x) = \Theta(x - \theta_t)$. The quantity $h_{i,t}$ is the local field of neuron $i$ at time $t$ and $a$ is the activity of the stored patterns $\xi_i^\mu \in \{0,1\}$, $\mu = 1,\dots,p$. The latter are independent identically distributed random variables (IIDRV) with respect to $i$ and $\mu$, determined by the probability distribution

$$p(\xi_i^\mu) = a\,\delta(\xi_i^\mu - 1) + (1-a)\,\delta(\xi_i^\mu). \tag{2}$$

At this point we remark that the activity can be written as $a = (1+b)/2$, with $-1 < b < 1$ the bias of the patterns as defined, e.g., in […]. In fact, $\langle \xi_i^\mu \rangle = a$, but no correlations between the patterns occur, i.e., $\langle \xi_i^\mu \xi_i^\nu \rangle - a^2 = 0$ for $\mu \neq \nu$. We now consider an extremely diluted asymmetric version of this model in which each neuron is connected, on average, with $C$ other neurons. In that case the synaptic couplings $J_{ij}$ are determined by the covariance rule

$$J_{ij} = \frac{C_{ij}}{C\tilde a}\sum_{\mu=1}^{p}(\xi_i^\mu - a)(\xi_j^\mu - a), \qquad \tilde a \equiv a(1-a). \tag{3}$$

Here the $C_{ij} \in \{0,1\}$ are IIDRV with probability $\Pr\{C_{ij} = 1\} = C/N \ll 1$, $C > 0$. For $a = 1/2$ and $\theta_t = 0$ we recover the diluted Hopfield model.
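
The model of Eqs. (1)-(3) can be simulated directly for modest sizes. The following is a minimal NumPy sketch, not the authors' code: the function names are hypothetical, the dense $N\times N$ coupling matrix only suits small $N$ (it does not reach the extremely diluted regime $\ln C \ll \ln N$), and excluding self-couplings and mapping ties $h_{i,t} = \theta_t$ to 0 are assumptions.

```python
import numpy as np


def make_patterns(N, p, a, rng):
    """IIDRV patterns xi[mu, i] in {0, 1} with Pr{xi = 1} = a, as in Eq. (2)."""
    return (rng.random((p, N)) < a).astype(float)


def make_couplings(xi, C, a, rng):
    """Diluted covariance rule of Eq. (3).

    The dilution mask C_ij is drawn independently for (i, j) and (j, i),
    so the couplings are asymmetric. Diagonal terms are removed (assumption).
    """
    p, N = xi.shape
    mask = (rng.random((N, N)) < C / N).astype(float)  # C_ij with Pr{C_ij = 1} = C/N
    np.fill_diagonal(mask, 0.0)
    dev = xi - a                                       # xi_i^mu - a, shape (p, N)
    return mask * (dev.T @ dev) / (C * a * (1.0 - a))


def parallel_update(sigma, J, a, theta):
    """One parallel step of Eq. (1) with F(x) = Theta(x - theta)."""
    h = J @ (sigma - a)                 # local fields h_{i,t}
    return (h > theta).astype(float)    # ties h == theta map to 0 (assumption)


rng = np.random.default_rng(0)
N, C, p, a = 1000, 50, 5, 0.1
xi = make_patterns(N, p, a, rng)
J = make_couplings(xi, C, a, rng)
sigma_next = parallel_update(xi[0].copy(), J, a, theta=0.0)
```

Starting from a stored pattern and applying `parallel_update` repeatedly gives a finite-size trajectory that can be compared with the order-parameter equations below.
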

The relevant order parameters measuring the quality of retrieval are the overlap of the microscopic state of the network and the $\mu$th pattern, and the neural activity

$$m_{N,t}^\mu = \frac{1}{aN}\sum_i \xi_i^\mu \sigma_{i,t}, \qquad q_{N,t} = \frac{1}{N}\sum_i \sigma_{i,t}. \tag{4}$$

The $m_{N,t}^\mu$ are normalized order parameters within the interval $[0,1]$, which attain the maximal value 1. They have to be considered over the diluted structure such that the loading $\alpha$ is defined by $p = \alpha C$. The Hamming distance between the state of the network and the pattern $\xi^\mu$ can be written as $d_t^\mu = a - 2a\,m_{N,t}^\mu + q_{N,t}$.

To fix the ideas and without loss of generality, we take an initial network configuration correlated with only one pattern, meaning that only the retrieval overlap for that pattern, say $\mu = 1$, is macroscopic, i.e., of order $O(1)$ in the thermodynamic limit $C, N \to \infty$. The rest of the patterns cause a residual noise at each time step of the dynamics. Depending on the architecture of the network, this noise might be extremely difficult to treat […]. A novel idea is then to let the network itself autonomously counter this residual noise at each step of the dynamical evolution, by introducing an adaptive, hence time-dependent, threshold. We propose the general form

$$\theta_t = c(a)\,[\mathrm{Var}(\omega_t)]^{1/2},$$

with $\omega_t$ this residual noise. This self-control mechanism of the network is complete if we find a way to determine $c(a)$.

FIG. 3. The basin of attraction as a function of $\alpha$ for $a = 0.01$ and initial $q_0 = a$ for the self-controlled model (full line) and the optimal threshold model (dashed line).

FIG. 4. The information $i$ as a function of $M_0$ for $a = 0.3$, $\alpha = 0.4$ with ($c = 1$) and without self-control. Left: analytic results. Right: simulations up to $t = 8$ for $N = 47\,000$, $C = 50$, $p = 20$, averaged over 10 samples.

In order to do so, we first write down the evolution equations governing the dynamics. We recall that for the particular model we are considering, the parallel dynamics can be solved exactly following the methods involving a signal-to-noise analysis (see, e.g., […]). Such an approach leads to the following equations for the order parameters in the thermodynamic limit $C, N \to \infty$:

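$$m_{t+1}^1 = \big\langle F_{\theta_t}\big((1-a)M_t^1 + \omega_t\big)\big\rangle_{\omega}, \tag{5}$$
$$q_{t+1} = a\,m_{t+1}^1 + (1-a)\,\big\langle F_{\theta_t}\big(-aM_t^1 + \omega_t\big)\big\rangle_{\omega}, \tag{6}$$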

with $M_t^1 = (m_t^1 - q_t)/(1-a)$, where we have averaged over the first pattern and where the angular brackets indicate that we still have to average over the residual noise, which can be written as $\omega_t = [\alpha Q_t]^{1/2}\,\mathcal{N}(0,1)$ with $Q_t = (1-2a)q_t + a^2$ and $\mathcal{N}(0,1)$ a Gaussian random variable with mean zero and variance unity. The order parameters $m_t^1$ and $q_t$ are the thermodynamic limits of (4). The quantity $M_t^1$ reduces to the overlap of the Hopfield model, again when taking $a = 1/2$ and $\theta_t = 0$. From now on we drop the superscript 1.
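
Because the residual noise is Gaussian, the averages in Eqs. (5) and (6) reduce to complementary error functions, $\langle \Theta(x + \omega_t - \theta_t)\rangle_\omega = \tfrac12\,\mathrm{erfc}\big((\theta_t - x)/\sqrt{2\alpha Q_t}\big)$. The sketch below iterates this map with the self-control threshold $\theta_{sc} = [-2(\ln a)\alpha Q_t]^{1/2}$ used later in the text. It is an illustration under these assumptions, with hypothetical function names, not the authors' code.

```python
import math


def noise_average(x, theta, var):
    """<Theta(x + omega - theta)> for omega ~ N(0, var), i.e. Pr{omega > theta - x}."""
    return 0.5 * math.erfc((theta - x) / math.sqrt(2.0 * var))


def self_control_step(m, q, a, alpha):
    """One step of Eqs. (5)-(6) with theta_t = sqrt(-2 ln(a) * alpha * Q_t)."""
    M = (m - q) / (1.0 - a)
    Q = (1.0 - 2.0 * a) * q + a * a    # for a <= 1/2 and q >= 0, Q_t >= a^2 > 0
    var = alpha * Q                     # Var(omega_t)
    theta = math.sqrt(-2.0 * math.log(a) * var)
    m_next = noise_average((1.0 - a) * M, theta, var)
    q_next = a * m_next + (1.0 - a) * noise_average(-a * M, theta, var)
    return m_next, q_next


def trajectory(m0, q0, a, alpha, steps=30):
    """List of (m_t, q_t) for t = 0, ..., steps."""
    traj = [(m0, q0)]
    for _ in range(steps):
        m, q = traj[-1]
        traj.append(self_control_step(m, q, a, alpha))
    return traj


# Parameters of Fig. 2: a = 0.01, alpha = 4, q0 = a.
retrieving = trajectory(1.0, 0.01, a=0.01, alpha=4.0)
silent = trajectory(0.0, 0.01, a=0.01, alpha=4.0)
```

Since the threshold is recomputed from $Q_t$ at every step, no parameter has to be tuned by hand once $a$ and $\alpha$ are given.
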

Equations (5) and (6) give a self-controlled dynamics if we can completely specify, a priori, the threshold $\theta_t$ proposed before. We remark that, for the present model, $\theta_t$ is a macroscopic parameter, so no average has to be taken over the microscopic random variables at each time step $t$. We therefore have a mapping with a threshold that changes at each time step, but without statistical history effects in the evolution process. What is left then is to find an optimal form for $c(a)$.

A very intuitive reasoning based on the detailed behavior of these equations goes as follows. To have $m_t \simeq 1 - \mathrm{erfc}(n)$ and $q_t \simeq a + \mathrm{erfc}(n)$ with $n > 0$ at a given time $t$, such that good retrieval properties, i.e., $\sigma_{i,t} = \xi_i$ for most $i$, are realized, we want the following inequalities to be satisfied: $(1-a)M_t - n[\alpha Q_t]^{1/2} \geq \theta_t$ and $-aM_t + n[\alpha Q_t]^{1/2} \leq \theta_t$. Eliminating $\theta_t$ between the two inequalities, we obtain $2n \leq M_t[\alpha Q_t]^{-1/2}$. When this bound is saturated, both inequalities hold with $\theta_t = n(1-2a)[\alpha Q_t]^{1/2}$, which, by the general form of the threshold, leads to $c(a) \simeq n(1-2a)$. Here we remark that $n$ itself depends on $a$, in the sense that for increasing $a$ it gets more difficult to have good retrieval, so that $n$ decreases. But it can still be chosen a priori.
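
The elimination of $\theta_t$ in this argument can be checked numerically. In the hypothetical helper below, $S = [\alpha Q_t]^{1/2}$ is the noise standard deviation and the two returned margins measure by how much the retrieval inequalities are satisfied for $\theta_t = cS$. Their sum is $M_t - 2nS$, independent of $c$, which is why $M_t \geq 2nS$ is necessary.

```python
def retrieval_margins(M, S, a, n, c):
    """Margins of the two retrieval inequalities for theta = c * S.

    upper >= 0  <=>  (1 - a) M - n S >= theta   (active sites stay on)
    lower >= 0  <=>  -a M + n S <= theta        (inactive sites stay off)
    """
    theta = c * S
    upper = (1.0 - a) * M - n * S - theta
    lower = theta + a * M - n * S
    return upper, lower


a, n, S = 0.01, 2.0, 0.2
c = n * (1.0 - 2.0 * a)                               # c(a) ~ n(1 - 2a)
at_bound = retrieval_margins(2.0 * n * S, S, a, n, c)  # M = 2 n S: both margins vanish
above = retrieval_margins(3.0 * n * S, S, a, n, c)     # M > 2 n S: both positive
below = retrieval_margins(1.5 * n * S, S, a, n, c)     # M < 2 n S: one must fail
```
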

In the limit of sparse coding, meaning that the fraction of active neurons is very small and tends to zero in the thermodynamic limit, we can present a more refined result for $c(a)$ by rewriting the second term on the r.h.s. of Eq. (6) asymptotically as

$$\big\langle F_{\theta_t}(-aM_t + \omega_t)\big\rangle_\omega \simeq \frac{1}{2}\,\mathrm{erfc}\Big(\frac{\theta_t}{\sqrt{2\alpha Q_t}}\Big) \simeq \frac{\sqrt{\alpha Q_t}}{\sqrt{2\pi}\,\theta_t}\exp\Big(-\frac{\theta_t^2}{2\alpha Q_t}\Big).$$

With $\theta_t = c[\alpha Q_t]^{1/2}$ the right-hand side becomes $e^{-c^2/2}/(c\sqrt{2\pi})$. This term must vanish faster than $a$, so that we obtain $c = [-2\ln(a)]^{1/2}$. Using this and the first inequality written down above, we can evaluate the maximal capacity for which some small errors in the retrieval are allowed. The result is $\alpha = O(|a\ln(a)|^{-1})$, which is of the same order as the critical capacity found for non-self-controlled sparsely coded neural networks […].
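
The choice $c = [-2\ln(a)]^{1/2}$ and the resulting capacity estimate can be illustrated numerically. The sketch below, with hypothetical function names, compares the exact Gaussian tail $\tfrac12\,\mathrm{erfc}(c/\sqrt2)$ (the $a \to 0$ form of the second term of Eq. (6) with $\theta_t = c[\alpha Q_t]^{1/2}$) with $a$, and evaluates the bound on $\alpha$ from the first retrieval inequality, assuming $M_t \simeq 1$, $q_t \simeq a$ (so $Q_t \simeq a(1-a)$) and a fixed margin $n = 1$.

```python
import math


def c_self_control(a):
    """c(a) = [-2 ln(a)]^(1/2) for sparse coding."""
    return math.sqrt(-2.0 * math.log(a))


def tail_over_a(a):
    """Exact tail (1/2) erfc(c / sqrt 2) divided by a; it decreases towards 0 as a -> 0."""
    c = c_self_control(a)
    return 0.5 * math.erfc(c / math.sqrt(2.0)) / a


def alpha_bound(a, n=1.0):
    """Largest alpha with (1 - a) M - n S >= c S for M = 1 and S^2 = alpha a (1 - a)."""
    c = c_self_control(a)
    return (1.0 - a) / ((c + n) ** 2 * a)


for a in (1e-2, 1e-4, 1e-6):
    # alpha_bound * a |ln a| stays O(1), i.e. alpha = O(1 / (a |ln a|))
    print(a, tail_over_a(a), alpha_bound(a) * a * abs(math.log(a)))
```

The ratio `tail_over_a` goes to zero only slowly (like $1/c$), consistent with the requirement that the term vanish faster than $a$.
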

Next, it is known that while the Hamming distance is a good measure for the performance of a uniform network (i.e., $a = 1/2$), it does not give a complete description of the information content of sparsely coded networks. More precisely, it cannot distinguish between a situation where most of the wrong neurons ($\sigma_i \neq \xi_i$) are turned off and a situation where these wrong neurons are turned on. This distinction is crucial because the inactive neurons carry less information than the active ones. To give one example, when $\sigma_i = 0$ for all $i$, the Hamming distance is $d = a$ and hence vanishes in the sparsely coded limit, while for $\sigma_i = 1$ for all $i$, $d = 1 - a$ and hence goes to 1. However, in both cases no information is transmitted. To solve this problem we introduce the mutual information content of the network.

FIG. 5. The information $i$ as a function of $\alpha$ for the self-controlled model with several values of $a$.

FIG. 6. The maximal information $i_{\max}/\ln(2)$ and $\alpha_{\max}\,a|\ln(a)|$ as a function of $|\ln(a)|$.

The mutual information function (see, e.g., […]) is a concept in information theory which measures the average amount of information that can be received by the user by observing the signal at the output of a channel. For the problem at hand, i.e., the retrieval dynamics of the pattern $\mu = 1$, where each time step is regarded as a channel, it can be defined as (we drop the time index $t$)

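$$I(\sigma_i;\xi_i) = S(\sigma_i) - \langle S(\sigma_i|\xi_i)\rangle_{\xi_i}, \tag{7}$$
$$S(\sigma_i) = -\sum_{\sigma_i} p(\sigma_i)\ln[p(\sigma_i)], \tag{8}$$
$$S(\sigma_i|\xi_i) = -\sum_{\sigma_i} p(\sigma_i|\xi_i)\ln[p(\sigma_i|\xi_i)]. \tag{9}$$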

Here $S(\sigma_i)$ and $S(\sigma_i|\xi_i)$ are the entropy and the conditional entropy of the output, respectively. The quantity $p(\sigma_i|\xi_i)$ is the conditional probability that the $i$th neuron is in state $\sigma_i$ at time $t$, given that the $i$th site of the pattern being retrieved is $\xi_i$. It is given by

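$$p(\sigma|\xi) = \big[\gamma_0 + (m - \gamma_0)\xi\big]\,\delta(\sigma - 1) + \big[1 - \gamma_0 - (m - \gamma_0)\xi\big]\,\delta(\sigma), \qquad \gamma_0 = \frac{q - am}{1-a}, \tag{10}$$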

where we have assumed that this formula holds for every site index $i$, and where $m$ and $q$ are precisely the order parameters (4) in the thermodynamic limit. We have also used the normalizations $\sum_\sigma p(\sigma|1) = \sum_\sigma p(\sigma|0) = 1$. Using the probability distribution of the patterns (Eq. (2)), we furthermore obtain

$$p(\sigma) = \sum_\xi p(\xi)\,p(\sigma|\xi) = q\,\delta(\sigma - 1) + (1-q)\,\delta(\sigma). \tag{11}$$

Hence the expressions for the entropies defined above become

$$S(\sigma) = -q\ln q - (1-q)\ln(1-q), \tag{12}$$
$$\langle S(\sigma|\xi)\rangle_\xi = -a\big[m\ln m + (1-m)\ln(1-m)\big] - (1-a)\big[\gamma_0\ln\gamma_0 + (1-\gamma_0)\ln(1-\gamma_0)\big]. \tag{13}$$

Recalling Eq. (7), this completes the calculation of the mutual information content of the present model.
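
Eqs. (7)-(13) reduce the mutual information to a function of the three numbers $m$, $q$ and $a$. A minimal sketch (hypothetical function names; natural logarithms, so $I$ is in nats) that also reproduces the example given earlier: the states $\sigma_i = 0$ and $\sigma_i = 1$ for all $i$ have very different Hamming distances, yet both carry $I = 0$.

```python
import math


def xlogx(x):
    """x ln x with the convention 0 ln 0 = 0."""
    return x * math.log(x) if x > 0.0 else 0.0


def binary_entropy(x):
    """-x ln x - (1 - x) ln(1 - x), in nats."""
    return -xlogx(x) - xlogx(1.0 - x)


def mutual_information(m, q, a):
    """I = S(sigma) - <S(sigma|xi)>_xi from Eqs. (10)-(13)."""
    gamma0 = (q - a * m) / (1.0 - a)          # p(sigma = 1 | xi = 0), Eq. (10)
    s_out = binary_entropy(q)                 # Eq. (12)
    s_cond = a * binary_entropy(m) + (1.0 - a) * binary_entropy(gamma0)  # Eq. (13)
    return s_out - s_cond


def hamming_from_order(m, q, a):
    """d = a - 2 a m + q."""
    return a - 2.0 * a * m + q


a = 0.01
all_off = (0.0, 0.0)   # sigma_i = 0 for all i: m = 0, q = 0
all_on = (1.0, 1.0)    # sigma_i = 1 for all i: m = 1, q = 1
perfect = (1.0, a)     # sigma_i = xi_i for all i: m = 1, q = a
```

For perfect retrieval $\gamma_0 = 0$ and $I$ reduces to the pattern entropy $-a\ln a - (1-a)\ln(1-a)$, the largest value a single site can transmit here.
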

We have solved this self-controlled dynamics for the sparsely coded network numerically and compared its retrieval properties with those of non-self-controlled models. We are only interested in the retrieval solutions leading to $M > 0$ and carrying a non-zero information $I$.

In Fig. 1 we have plotted the information content $i = pNI/\#J = \alpha I$, where $\#J = NC$ is the number of couplings, as a function of the threshold $\theta$ for $a = 0.1$ and $a = 0.001$ and different values of $\alpha$, without self-control. This illustrates that it is rather difficult, especially for sparse coding, to choose a threshold interval such that $i$ is non-zero.
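
A curve of the kind shown in Fig. 1 can be sketched by iterating Eqs. (5) and (6) with a fixed threshold $\theta$ from the pattern state $m_0 = 1$, $q_0 = a$, and evaluating $i = \alpha I$ with Eqs. (10)-(13) after a fixed number of steps. This is an illustration under those assumptions (the number of steps, the starting point, the parameter values and the function names are hypothetical choices), not a reproduction of the published figure.

```python
import math


def noise_average(x, theta, var):
    """Pr{omega > theta - x} for omega ~ N(0, var)."""
    return 0.5 * math.erfc((theta - x) / math.sqrt(2.0 * var))


def fixed_threshold_step(m, q, a, alpha, theta):
    """One step of Eqs. (5)-(6) with a constant threshold theta (no self-control)."""
    M = (m - q) / (1.0 - a)
    var = alpha * ((1.0 - 2.0 * a) * q + a * a)
    m_next = noise_average((1.0 - a) * M, theta, var)
    q_next = a * m_next + (1.0 - a) * noise_average(-a * M, theta, var)
    return m_next, q_next


def binary_entropy(x):
    """-x ln x - (1 - x) ln(1 - x), with 0 ln 0 = 0."""
    return -sum(y * math.log(y) for y in (x, 1.0 - x) if y > 0.0)


def information_per_coupling(theta, a, alpha, steps=50):
    """i = alpha * I after `steps` iterations started from m = 1, q = a."""
    m, q = 1.0, a
    for _ in range(steps):
        m, q = fixed_threshold_step(m, q, a, alpha, theta)
    gamma0 = (q - a * m) / (1.0 - a)
    info = binary_entropy(q) - a * binary_entropy(m) - (1.0 - a) * binary_entropy(gamma0)
    return alpha * info


a, alpha = 0.1, 0.1
thetas = [0.05 * k for k in range(101)]      # theta from 0 to 5
curve = [information_per_coupling(t, a, alpha) for t in thetas]
```

Thresholds that are too low switch on many inactive sites, while thresholds that are too high silence the network; in both cases $i$ drops, which is the narrow-window effect described above.
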

In Fig. 2 we compare the time evolution of the retrieval overlap $m_t$, starting from several initial values $m_0$, for the self-control model with an initial neural activity $q_0 = 0.01 = a$ and $\theta_{sc} = [-2(\ln a)\alpha Q_t]^{1/2}$, with the model where the threshold is chosen by hand in an optimal way, in the sense that we took the one with the greatest information content $i$, by looking at the corresponding results of Fig. 1 for $a = 0.01$. We see that the self-control forces more of the overlap trajectories to go to the retrieval attractor: it substantially improves the basin of attraction. This is further illustrated in Fig. 3, where the basin of attraction for the whole retrieval phase $R$ is shown for the model with a $\theta_{opt}$ selected for every loading $\alpha$ and for the model with self-control $\theta_{sc}$. We remark that even near the border of critical storage the results are still improved. Hence the storage capacity itself is also larger. These results are not strongly dependent upon the initial value of $q_0$ as long as $q_0 = O(a)$.
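
Comparing basins of attraction, as in Figs. 2 and 3, amounts to asking from which initial overlaps $m_0$ the map of Eqs. (5) and (6) flows to a retrieval state. A minimal sketch under stated assumptions: the function names are hypothetical, "retrieval" is taken to mean $m_t > 0.5$ after a fixed number of steps, and the fixed threshold passed in is an arbitrary illustrative value rather than the optimized $\theta_{opt}$ of the text.

```python
import math


def noise_average(x, theta, var):
    """Pr{omega > theta - x} for omega ~ N(0, var)."""
    return 0.5 * math.erfc((theta - x) / math.sqrt(2.0 * var))


def step(m, q, a, alpha, theta=None):
    """Eqs. (5)-(6); theta=None selects the self-control threshold theta_sc."""
    M = (m - q) / (1.0 - a)
    var = alpha * ((1.0 - 2.0 * a) * q + a * a)
    if theta is None:
        theta = math.sqrt(-2.0 * math.log(a) * var)
    m_next = noise_average((1.0 - a) * M, theta, var)
    q_next = a * m_next + (1.0 - a) * noise_average(-a * M, theta, var)
    return m_next, q_next


def basin_edge(a, alpha, q0, theta=None, steps=50, grid=200):
    """Smallest m0 on a grid in [0, 1] whose trajectory ends with m > 0.5 (None if none)."""
    for k in range(grid + 1):
        m, q = k / grid, q0
        for _ in range(steps):
            m, q = step(m, q, a, alpha, theta)
        if m > 0.5:
            return k / grid
    return None


edge_sc = basin_edge(a=0.01, alpha=4.0, q0=0.01)                 # self-control
edge_fixed = basin_edge(a=0.01, alpha=4.0, q0=0.01, theta=0.6)   # illustrative fixed theta
```

A smaller returned edge means a larger basin; scanning $\alpha$ with this helper gives a curve of the type shown in Fig. 3.
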



Furthermore, we find that self-control gives a comparable improvement for not so sparse models, e.g., $a = 0.3$. This is illustrated in Fig. 4, where we show some analytic results together with a first set of simulations for the basins of attraction. Simulations of this type for extremely diluted models are known to be difficult because of the theoretical limits $C, N \to \infty$ and $\ln C \ll \ln N$. Nevertheless, concerning the self-control aspect, qualitative agreement with the analytic results is clearly obtained. For these values of $a$, $M_0$ is the relevant quantity. The quantitative difference is mostly due to the fact that $M_0^{\rm sim} \simeq M_0^{\rm th} + O(1/\sqrt{Ca})$.

Figure 5 displays the information $i$ as a function of $\alpha$ for the self-controlled model with several values of $a$. We observe that $i_{\max} = i(\alpha_{\max})$ is reached somewhat before the critical capacity and that it slowly increases with decreasing $a$.

Finally, in Fig. 6 we have plotted $i_{\max}/\ln(2)$ and $\alpha_{\max}\,a|\ln(a)|$ as a function of the activity on a logarithmic scale. It shows that $i_{\max}$ increases with $|\ln(a)|$ until it starts to saturate. The saturation is rather slow, in agreement with results found in the literature […].

In conclusion, we have found a novel way to let a diluted network autonomously control its dynamics such that the basins of attraction and the mutual information content are maximal.